# Multimodal Image-Text Understanding

Open Qwen2VL
CC
Open-Qwen2VL is a multimodal model that accepts both images and text as input and generates text output.
Image-to-Text English
weizhiwang
568
15
Qwen.qwen2.5 VL 3B Instruct GGUF
Qwen2.5-VL-3B-Instruct is a 3B-parameter vision-language model that supports image-to-text generation tasks.
Image-to-Text
DevQuasar
1,107
3
Qwen.qwen2.5 VL 7B Instruct GGUF
Qwen2.5-VL-7B-Instruct is a 7B-parameter multimodal vision-language model that jointly understands images and text and generates text output.
Image-to-Text
DevQuasar
2,225
1
Qwen2.5 VL 3B Instruct GPTQ Int3
Apache-2.0
This is a GPTQ-Int3 quantized version of Qwen2.5-VL-3B-Instruct for multimodal image-text tasks, with lower VRAM usage and faster inference than the unquantized model.
Image-to-Text Transformers Supports Multiple Languages
hfl
60
1
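The "reduced VRAM usage" claim for Int3 quantization can be made concrete with a rough back-of-the-envelope estimate. The sketch below is illustrative only: `weight_memory_gib` is a hypothetical helper, the parameter count is the nominal 3 billion rather than the exact figure for this model, and the estimate covers weights alone. It ignores GPTQ's per-group scales and zero points, any layers left unquantized, activations, and the KV cache, so real usage will be higher.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in GiB.

    Hypothetical helper for illustration; not part of any library.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


# Assumption: a nominal 3e9 parameters for a "3B" model.
NOMINAL_PARAMS = 3e9

fp16 = weight_memory_gib(NOMINAL_PARAMS, 16)  # 16-bit baseline
int3 = weight_memory_gib(NOMINAL_PARAMS, 3)   # 3-bit quantized weights

print(f"FP16 weights: {fp16:.2f} GiB")
print(f"Int3 weights: {int3:.2f} GiB")
print(f"Reduction:    {fp16 / int3:.2f}x")
```

Under these assumptions the reduction factor is simply the ratio of bit widths, 16/3, regardless of model size.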
Qwen2.5 VL 7B Instruct GPTQ Int3
Apache-2.0
This is an unofficial GPTQ-Int3 quantized version of the Qwen2.5-VL-7B-Instruct model, suitable for multimodal image-text-to-text tasks.
Image-to-Text Transformers Supports Multiple Languages
hfl
577
1
Paligemma2 3b Mix 224 Jax
PaliGemma 2 is an upgraded vision-language model based on Gemma 2 that accepts multilingual image-text input and produces text output.
Image-to-Text
google
38
1
Paligemma2 10b Mix 448
PaliGemma 2 is a vision-language model based on Gemma 2, supporting image and text inputs to generate text outputs, suitable for various vision-language tasks.
Image-to-Text Transformers
google
31.63k
25
Llava 1.6 Gguf
Apache-2.0
LLaVA-1.6 is an open-source vision-language model that supports image-text-to-text tasks, with improved visual understanding and text generation capabilities.
Image-to-Text
cmp-nct
1,735
75
Image Caption Large Copy
BSD-3-Clause
BLIP is a vision-language pretraining model that performs well on image captioning by bootstrapping captions to make effective use of noisy web data.
Image-to-Text Transformers
Sof22
1,042
10